DETAILED ACTION

This final office action is prepared in response to the applicant's amendments and 
arguments filed on July 19, 2010 as a reply to the non-final office action mailed on April 20, 
2010. 

Claims 17-26, 28-41 and 43-52 were previously pending; 
Claims 18 and 35 have been cancelled; 

Claims 17, 19-21, 24, 28, 31-34, 36, 39, 43, 46-48 and 50-51 have been amended; 
Claims 17, 19-26, 28-34, 36-41 and 43-52 are now pending; 

Response to Arguments 

Applicant's arguments and amendments filed on July 19, 2010 have been carefully
considered but are deemed unpersuasive in view of the rationale presented below in this
section.

Accordingly, THIS ACTION IS MADE FINAL. See MPEP 706.07(a). Applicant is 
reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 

1. Regarding amendments to the independent claims 17, 32, 47 and 50, Applicant argued that
the references Russell (U.S. 5,526,407) and Yamamoto (U.S. 4,355,338) did not disclose the element
"producing an index table" that is newly added into the respective independent claims.

However, Examiner considers Applicant's arguments unpersuasive because Russell
teaches, in Fig. 5 and the specification, "tag tables", which anticipate the index table in the claim.
The claim did not recite any additional information specifically regarding the "index table" to
differentiate the index table from the tag table, leaving the scope of the index table very broad and
making Russell's tag table a good example of the index table.

Furthermore, Russell disclosed a data structure in the form of an array for storing phrase
descriptors (Russell, col. 18, lines 57-61), where a phrase descriptor stores information about a
phrase, such as the phrase ID and phrase attributes including the date and time of the start of the
phrase, the duration of the phrase and the identification of the speaker (Russell, col. 18, lines
53-56). Examiner considers an array of phrase descriptors to be another example of the "index
table" in claims 17, 32, 47 and 50.

Claim Rejections - 35 USC §103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in 
section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are 
such that the subject matter as a whole would have been obvious at the time the invention was made to a person 
having ordinary skill in the art to which said subject matter pertains. Patentability shall not be negatived by the 
manner in which the invention was made. 

2. Claims 17, 20, 26, 32-33, 41, 47 and 50 are rejected under 35 U.S.C. 103(a) as obvious
over Russell et al. (U.S. Patent No. 5,526,407, hereinafter "Russell"), in view of
Yamamoto et al. (U.S. Patent No. 4,355,338, hereinafter "Yamamoto").

Regarding claim 17, Russell disclosed a method for digitally recording an analog audio 
signal, the method comprising: 

(a) receiving an analog audio signal (Russell, Fig. 2 and col. 9, line 30 disclosed "analog
front end", which records analog signal) containing audio information and signal pauses
information (a speech signal by nature contains speech information (i.e., audio) and non-speech
information (i.e., signal pauses), as evident in Russell, col. 7, lines 4-5);

(b) converting the analog audio signal into digital audio signal comprising audio 
information data and signal pause duration data (Russell, Fig. 2 and col. 9, lines 31-33); 

(c) storing the audio information data of the digital audio signal as information data 
blocks and the signal pause duration data of the digital audio signal as signal pause data blocks 
having different time durations in a memory (Russell, col. 6, lines 43-53 disclosed storing the 
speech stream and structure which represents categorized portions of the speech stream; Russell, 
col. 15, lines 8-11 further disclosed that the characterization of speech includes the phrase time
duration and the presence of pauses), 

wherein each information data block contains an information data block identifier and 
audio information data, and signal pause data block contains a signal pause data block identifier 
and signal pause duration data (Russell, col. 18, lines 2-13, "a data element" and "Phrase ID" or 
col. 18, lines 64-66, "phrase descriptor structure" and "phrase ID"); 

and the audio information data and the signal pause duration data of the resulted digital 
audio signal represent outputs at a normal speaking speed (Russell, col. 7, lines 4-7 and col. 7, 
lines 13-15. In particular, col. 7, lines 13-15 disclosed "When playing the recalled speech, the 
present invention may optionally skip the identified speech pauses and non-speech utterances," 
which implies that the pauses and non-speech utterances represent output at a normal speaking 
speed); 

(d) generating a plurality of audio information data sequences by sequentially reading the
information data blocks and the signal pause data blocks (Russell, col. 14, lines 43-50 disclosed
that a speech process program 136 separates the speech into phrases demarcated by perceptible
pauses; col. 17, line 67 disclosed "RIFF chunk" as an audio information data sequence), the
audio information data sequences being separated by the signal pause data blocks if an assigned
time duration of the signal pause data block is higher than a predetermined time duration (col.
17, line 64 disclosed that the silence of a specified second duration marks a signal pause).

(e) producing an index table by sequentially reading the information data blocks and the 
signal pause data blocks (Russell, Fig. 5 and col. 14, lines 43-67 and col. 11, lines 1-17, "tag 
tables" or Fig. 18 and col. 18, lines 53-61, "phrase descriptors"). 
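
For illustration only, the block structure and index table recited in elements (c) through (e)
might be sketched as follows. This is a minimal sketch under stated assumptions, not code from
Russell or from the application: the class names, the string identifiers and the function
build_index_table are hypothetical, and the index table is assumed to hold the positions of the
first and last information data block of each sequence.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class InfoBlock:
    block_id: str      # information data block identifier (hypothetical tag)
    audio: bytes       # audio information data


@dataclass
class PauseBlock:
    block_id: str      # signal pause data block identifier (hypothetical tag)
    duration_s: float  # signal pause duration at normal speaking speed


Block = Union[InfoBlock, PauseBlock]


def build_index_table(blocks: List[Block], min_pause_s: float) -> List[Tuple[int, int]]:
    """Read the blocks sequentially and return (first, last) positions of the
    information blocks of each audio information data sequence. A pause block
    separates two sequences only if its duration exceeds min_pause_s."""
    table: List[Tuple[int, int]] = []
    start = None  # position of the first info block of the open sequence
    last = None   # position of the most recent info block seen
    for pos, block in enumerate(blocks):
        if isinstance(block, InfoBlock):
            if start is None:
                start = pos
            last = pos
        elif block.duration_s > min_pause_s and start is not None:
            table.append((start, last))  # a long pause closes the open sequence
            start = None
    if start is not None:
        table.append((start, last))      # close the sequence still open at the end
    return table


blocks = [
    InfoBlock("I", b"a"), PauseBlock("P", 0.3), InfoBlock("I", b"b"),
    PauseBlock("P", 3.0), InfoBlock("I", b"c"), InfoBlock("I", b"d"),
]
print(build_index_table(blocks, min_pause_s=2.0))  # [(0, 2), (4, 5)]
```

In this sketch the 0.3 second pause does not split the first sequence, while the 3.0 second pause
does, because only the latter exceeds the assumed 2 second threshold.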

Russell did not explicitly disclose receiving an analog audio signal played at an increased
speed.

However, Yamamoto disclosed a method for fast reproduction of music or language
tapes, where it is known in the prior art that, to mass-reproduce music or language tapes quickly,
the source tape can be driven at a high speed (such as 32 times the normal speed) (see Yamamoto,
col. 1, lines 19-22 and lines 38-45).
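
As a rough illustration of the effect of high-speed playback, the arithmetic can be sketched as
below. The function names are hypothetical, and the rescaling of measured durations back to normal
speed is only one plausible reading of how pause durations captured at an increased speed could be
made to represent a normal speaking speed; it is not taken from Yamamoto or Russell.

```python
def capture_time_s(recording_s: float, speed_factor: float) -> float:
    """Time needed to play a recording when the source runs at
    speed_factor times the normal speed."""
    return recording_s / speed_factor


def normal_speed_duration_s(measured_s: float, speed_factor: float) -> float:
    """Rescale a duration measured during fast playback to the duration
    it represents at normal speed (assumed linear scaling)."""
    return measured_s * speed_factor


# A 60-minute tape driven at 32 times the normal speed.
print(capture_time_s(3600.0, 32.0))            # 112.5
# A pause measured as 0.0625 s during that fast playback.
print(normal_speed_duration_s(0.0625, 32.0))   # 2.0
```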

One of ordinary skill in the art would have been motivated to combine Russell and
Yamamoto because both disclosed a system and method for converting an analog signal to a
digital signal using an analog-to-digital converter (Russell, Fig. 2, "Analog Front-end"; Yamamoto,
Fig. 2, "A/D 9").

Therefore, it would have been obvious for one skilled in the art to apply Yamamoto's
teaching to Russell such that a pre-recorded audio signal is loaded into the processing device 43 in
Fig. 3 in reduced time, allowing the speech processing modules (Fig. 4, "voice print info",
"speech extraction", "codec") sufficient time to process the speech and produce output
without introducing a noticeable delay to the user. The combination yields the highly desirable
result of providing a better user experience.

Regarding claim 20, the combination of Russell and Yamamoto disclosed the method of 
claim 17. 

Russell further disclosed wherein producing the index table comprises processing the 
sequentially read data blocks (col. 14, lines 43-67 and col. 11, lines 1-17). 

Regarding claim 32, Russell disclosed a method for digitally recording an analog audio 
signal with automatic indexing, the method comprising: 

(a) receiving an analog audio signal containing audio information and signal pauses (Fig. 2 and
col. 9, line 30 disclosed "analog front end", which records analog signal; a speech signal by
nature contains speech information (i.e., audio) and non-speech information (i.e., signal pauses),
as evident in the disclosure in col. 7, lines 4-5);

(b) converting the analog audio signal into digital audio data comprising audio 
information data and signal pause duration data (Fig. 2 and column 9, lines 31-33); 

(c) storing the converted digital audio data (col. 6, lines 44-45, "stores the speech stream 
in at least a temporary storage"); 

(d) reading the stored digital audio data sequentially (col. 14, lines 48-49 disclosed that
the speech process program 136 allocates buffers to receive the real-time speech and separate the
speech into phrases, which implies that the speech process program 136 must read the real-time
speech sequentially);

Application/Control Number: 10/031,471
Art Unit: 2444

(e) decoding whether the digital audio data are audio information data or signal pause 
duration data (col. 15, lines 46-53 and col. 17, lines 63-65); 

(f) storing the audio information data as information data blocks and the signal 
pause duration data as signal pause data blocks in a memory (column 6, lines 43-53 disclosed 
storing the speech stream and structure which represents categorized portions of the speech 
stream; col. 15 lines 8-11 further disclosed that the characterization of speech includes the phrase 
time duration and the presence of pauses), 

wherein each information data block contains an information data block identifier and 
audio information data, and signal pause data block contains a signal pause data block identifier 
and signal pause duration data (Russell, col. 18, lines 2-13, "a data element" and "Phrase ID" or 
col. 18, lines 64-66, "phrase descriptor structure" and "phrase ID"); and 

(g) reading the stored data blocks sequentially in order to produce a data structure for
managing the indexing (col. 14, lines 43-50 disclosed that a speech process program 136 separates
the speech into phrases demarcated by perceptible pauses; col. 17, line 67 disclosed "RIFF
chunk" as an audio information data sequence),

wherein a succession of information data blocks which is not interrupted by a signal
pause with a pre-determined duration being detected as an audio information data sequence
whose start and end are stored in the data structure for managing the indexing (col. 17, lines
63-64 disclosed that sound of at least a certain first threshold duration followed by silence of a
specified second duration indicates that a phrase has been completed; and the times at which the
phrases began and ended are recorded).

(h) producing an index table by sequentially reading the information data blocks and the 
signal pause data blocks (Russell, Fig. 5 and col. 14, lines 43-67 and col. 11, lines 1-17, "tag 
tables" or Fig. 18 and col. 18, lines 53-61, "phrase descriptors"). 
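
As a rough illustration of the phrase boundary rule cited for the "wherein" clause above (sound of
at least a first threshold duration followed by silence of a specified second duration), the
following sketch assumes the signal has already been reduced to alternating runs of sound and
silence. The run representation, the function name completed_phrases and the threshold values are
assumptions made for this example and are not taken from Russell.

```python
from typing import List, Tuple

Run = Tuple[bool, float]  # (is_sound, duration in seconds)


def completed_phrases(runs: List[Run], min_sound_s: float,
                      min_silence_s: float) -> List[Tuple[float, float]]:
    """Return (start, end) times of every sound run that lasts at least
    min_sound_s and is immediately followed by a silence run of at least
    min_silence_s."""
    phrases: List[Tuple[float, float]] = []
    t = 0.0  # running time at the start of the current run
    for i, (is_sound, dur) in enumerate(runs):
        start, end = t, t + dur
        t = end
        if not is_sound or dur < min_sound_s:
            continue  # silence, or sound too short to count as a phrase
        if i + 1 < len(runs):
            next_is_sound, next_dur = runs[i + 1]
            if not next_is_sound and next_dur >= min_silence_s:
                phrases.append((start, end))  # record when the phrase began and ended
    return phrases


runs = [(True, 2.0), (False, 1.0), (True, 0.25), (False, 1.0), (True, 3.0), (False, 0.25)]
print(completed_phrases(runs, min_sound_s=0.5, min_silence_s=0.5))  # [(0.0, 2.0)]
```

Only the first run qualifies here: the second sound run is shorter than the assumed first
threshold, and the third is followed by a silence shorter than the assumed second threshold.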

Russell did not explicitly disclose receiving the analog audio signal played at an 
increased speed. 

However, Yamamoto disclosed a method for fast reproduction of music or language
tapes, where it is known in the prior art that, to mass-reproduce music or language tapes quickly,
the source tape can be driven at a high speed (such as 32 times the normal speed) (see Yamamoto,
col. 1, lines 19-22 and lines 38-45).

One of ordinary skill in the art would have been motivated to combine Russell and
Yamamoto because both disclosed a system and method for converting an analog signal to a
digital signal using an analog-to-digital converter (Russell, Fig. 2, "Analog Front-end"; Yamamoto,
Fig. 2, "A/D 9").

Therefore, it would have been obvious for one skilled in the art to apply Yamamoto's
teaching to Russell such that a pre-recorded audio signal is loaded into the processing device 43 in
Fig. 3 in reduced time, allowing the speech processing modules (Fig. 4, "voice print info",
"speech extraction", "codec") sufficient time to process the speech and produce output
without introducing a noticeable delay to the user. The combination yields the highly desirable
result of providing a better user experience.

Claim 47 is rejected under the same rationale as claim 32 as it lists elements that are all
listed in claim 32 and disclosed by Russell.

Claim 50 is rejected under the same rationale as claim 32 as it lists elements that are all 
listed in claim 32 and disclosed by Russell. 

Regarding claims 26 and 41, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell further disclosed wherein the digital audio data are compressed before storage
(col. 9, lines 39-43, "codec" and col. 14, line 66, "to compress the speech").

Regarding claim 33, the combination of Russell and Yamamoto disclosed the method of 
claim 32. 

Russell further disclosed wherein the data structure produced for managing the indexing 
is an index table (Fig. 5 and col. 11, lines 1-27). 

3. Claims 19 and 34 are rejected under 35 U.S.C. 103(a) as obvious over Russell and 
Yamamoto, in view of Welch et al. (U.S. Patent No. 4,336,421, hereinafter "Welch"). 

Regarding claims 19 and 34, the combination of Russell and Yamamoto disclosed the
method of claims 17 and 33.

Russell further disclosed the start and end of an audio information data sequence are
stored as start and end addresses (Russell, Fig. 5 and column 18).

Russell did not explicitly disclose that the start address and end address are for a first 
address pointer and a second address pointer of the index table. The examiner interprets the first 
address pointer and the second address pointer as the pointer to the start and end of an audio 
information data in the memory, as it is unclear to the examiner what a first address pointer and a
second address pointer of the index table entail.

However, in the same field of endeavor, Welch disclosed storing the start and end of a 
speech segment as the start and end addresses in the memory (Welch, col. 13, lines 42-64). 

One of ordinary skill in the art would have been motivated to combine Russell and Welch 
because both disclosed detecting voice sounds and pauses in speech (Russell, "Summary of the 
Invention" and Welch, "Summary of the Invention"), and Welch supplemented Russell's 
teaching with implementation details relating to buffer management for the speech data. 

Therefore, it would have been obvious for one to combine Russell and Welch such that 
Russell's invention can be embodied using techniques taught by Welch. 

4. Claims 21-23, 30, 36-38, 45, 48 and 51 are rejected under 35 U.S.C. 103(a) as obvious
over Russell and Yamamoto, in view of Freudberg et al. (U.S. Patent No. 4,696,031, hereinafter
"Freudberg").

Regarding claims 21, 36, 48 and 51, the combination of Russell and Yamamoto
disclosed the method of claims 20, 33, 47 and 50.

Russell further disclosed filtering out short spoken utterances that are not useful for the
user (Russell, col. 16, lines 21-46).

Russell did not explicitly disclose filtering based on a particular minimum value that the
number of information data blocks does not exceed and a particular first time limit value that the
signal pause of the two adjacent signal pause data blocks exceeds.

However, Freudberg disclosed filtering out short bursts of energy by combining an ON
time of less than 200 msec with the OFF times of the two adjacent OFF intervals to form a single
OFF interval, which is essentially the same as what is described in the claim (Freudberg, col. 6,
lines 32-49).
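
The merging step described for Freudberg can be sketched as follows. This is an illustrative
sketch only: the interval representation (a flag plus a duration in msec), the function name
absorb_short_bursts and the sample durations are assumptions for this example, and only the
200 msec figure comes from the cited passage.

```python
from typing import List, Tuple

Interval = Tuple[bool, int]  # (is_on, duration in msec)


def absorb_short_bursts(intervals: List[Interval], min_on_ms: int = 200) -> List[Interval]:
    """Relabel every ON interval shorter than min_on_ms as OFF, then merge
    neighbouring intervals of the same state into a single interval."""
    merged: List[Interval] = []
    for is_on, dur in intervals:
        if is_on and dur < min_on_ms:
            is_on = False  # a short burst is treated as part of the pause
        if merged and merged[-1][0] == is_on:
            merged[-1] = (is_on, merged[-1][1] + dur)  # extend the previous run
        else:
            merged.append((is_on, dur))
    return merged


intervals = [(True, 1000), (False, 300), (True, 120), (False, 400), (True, 900)]
print(absorb_short_bursts(intervals))
# [(True, 1000), (False, 820), (True, 900)]
```

The 120 msec burst and its two neighbouring OFF intervals (300 msec and 400 msec) collapse into
one OFF interval of 820 msec.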

One would have been motivated to combine Russell and Freudberg because both
disclosed silence (i.e., pause) detection and speech signal segmentation (Russell, col. 17, lines
60-65; Freudberg, col. 3, lines 28-40, "signal-ON" and "signal-OFF").

Therefore, it would have been obvious for one to incorporate Freudberg's method of
filtering out short energy bursts using threshold values into Russell to filter out information blocks
that are falsely detected as speech blocks so as to save system processing time and reduce the error
rate.

Regarding claims 22 and 37, the combination of Russell, Yamamoto and Freudberg
disclosed the method of claims 21 and 36.

Russell did not explicitly disclose wherein the minimum value is 1.

Freudberg disclosed the minimum value for ON time is 200 msec (Freudberg, col. 6,
lines 32-49).

Examiner considers the difference in the minimum value as an implementation choice
that may vary in different embodiments of the same inventive idea.

The rationale for the motivation to combine Russell and Freudberg is the same as that
provided in the rejection of claim 21.

Regarding claims 23 and 38, the combination of Russell, Yamamoto and Freudberg 
disclosed the method of claims 21 and 36. 

Russell did not explicitly disclose wherein the first time limit value is 0.5 seconds. 

Freudberg disclosed using an OFF interval (Freudberg, col. 6, lines 50-55). 

Examiner considers the value of 0.5 seconds specified in the claim as an implementation 
choice of Freudberg's OFF interval, which may vary in different embodiments of the same 
inventive idea. 

The rationale for the motivation to combine Russell and Freudberg is the same as that 
provided in the rejection of claim 21.

Regarding claims 30 and 45, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell did not explicitly disclose wherein a succession of information data blocks which 
is not separated by a signal pause data block whose signal pause duration data amount to a signal 
pause of more than 2 seconds is detected as an audio information data sequence. 

However, Freudberg disclosed a signal pause detection method using an ON time to
determine the minimum duration of speech intervals (Freudberg, col. 6, lines 32-49).

Examiner considers the time duration of 2 seconds recited in the present claim as an
implementation choice of Freudberg's ON time, which may vary in different embodiments of the
same inventive concept.

5. Claims 24-25, 31, 39-40, 46, 49 and 52 are rejected under 35 U.S.C. 103(a) as obvious 
over Russell and Yamamoto, in view of Imai et al. (U.S. 2001/0010037, hereinafter "Imai"). 

Regarding claims 24, 39, 49 and 52, the combination of Russell and Yamamoto 
disclosed the method of claims 20, 33, 48 and the apparatus of claim 50. 

Russell did not explicitly disclose while processing the data, overwriting signal duration 
data of signal pause data blocks whose signal pause duration exceeds a particular second time 
limit value with signal duration data having particular nominal signal duration. 

However, Imai (2001/0010037) disclosed a speech rate conversion system that replaces a 
non-speech interval exceeding a constant continued time with a break of the constant continued 
time that is shorter than the actual non-speech interval (Imai, [0034]). 
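
The replacement of long non-speech intervals can be illustrated with a short sketch. The function
name shorten_long_pauses and the list-of-durations representation are hypothetical. The sample
limit of 10 seconds and nominal duration of 2 seconds are the values recited in claims 25 and 40;
in Imai's arrangement as summarized above, the limit and the replacement duration would be the
same constant.

```python
from typing import List


def shorten_long_pauses(pauses_s: List[float], limit_s: float,
                        nominal_s: float) -> List[float]:
    """Return the pause durations with every pause longer than limit_s
    overwritten by nominal_s; all other pauses are left unchanged."""
    return [nominal_s if p > limit_s else p for p in pauses_s]


print(shorten_long_pauses([0.5, 12.0, 10.0, 30.0], limit_s=10.0, nominal_s=2.0))
# [0.5, 2.0, 10.0, 2.0]
```

A pause of exactly 10 seconds is kept in this sketch because it does not exceed the limit.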

One would have been motivated to combine Russell and Imai because both disclosed 
non-speech interval detection. 

Therefore, it would have been obvious for one to add Imai's speech rate conversion to
Russell to save storage space and adjust playback speed without losing audio information.

Regarding claims 25 and 40, the combination of Russell, Yamamoto and Imai disclosed
the method of claims 24 and 39.

Imai further disclosed wherein the second time limit value is 10 seconds and the nominal
signal duration is 2 seconds (Examiner considers the threshold values recited in the claim as an
implementation choice of Imai's constant continued time disclosed in [0034], which may vary in
different embodiments of the same inventive idea).

The rationale for the motivation to combine Russell and Imai is the same as that provided 
in the rejection of claim 24. 

Regarding claims 31 and 46, the combination of Russell and Yamamoto disclosed the 
method of claims 17 and 32. 

Russell did not explicitly disclose wherein, when receiving the analog audio signal, a 
playing speed of a data medium on which the analog audio signal is recorded can be set. 

However, Imai disclosed a method for converting speech rate at a preset scaling factor 
using speech interval detection (Imai, "Abstract"). 

The rationale for the motivation to combine Russell and Imai is the same as that provided 
in the rejection of claim 24. 

6. Claims 28-29 and 43-44 are rejected under 35 U.S.C. 103(a) as obvious over Russell and 
Yamamoto, in view of Gan et al. (IEEE Publication "Implementation of Silence Compression 
Scheme For G.723.1 Speech Coder Using TI TMS320S75 DSP Chip", 1997, hereinafter "Gan"). 

Regarding claims 28 and 43, the combination of Russell and Yamamoto disclosed the
method of claims 17 and 32.

Russell did not explicitly disclose wherein all the data blocks are of a same size and
correspond to a particular basic unit of duration.

However, Gan disclosed an implementation of silence compression for the G.723.1 coder,
wherein the G.723.1 coder has a frame size of 30 ms and a compressed frame size of 24 bytes at
the 6.3 kb/s coding rate.
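
The frame figures cited from Gan are consistent with simple arithmetic, shown here as a small
sketch. The function name frame_size_bytes is hypothetical, and the rounding up to a whole byte
is an assumption about how the coded bits are packed.

```python
def frame_size_bytes(bit_rate_bps: int, frame_ms: int) -> int:
    """Bytes needed to hold one coded frame, rounded up to a whole byte."""
    bits = bit_rate_bps * frame_ms // 1000  # coded bits per frame
    return (bits + 7) // 8                  # round up to whole bytes


print(frame_size_bytes(6300, 30))  # 24
```

At 6.3 kb/s, a 30 ms frame carries 189 bits, which fits in 24 bytes.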

One would have been motivated to combine Russell and Gan because both disclosed 
silence detection and codec. 

It would have been obvious for one to incorporate Gan's silence compression and
G.723.1 into Russell as one of the possible embodiments of Russell's speech peripheral.

Regarding claims 29 and 44, the combination of Russell, Yamamoto and Gan disclosed 
the method of claims 28 and 43. 

Russell did not explicitly disclose wherein the basic unit of duration is 30 ms. 

However, as already addressed above in the rejection of claim 28, Gan disclosed a silence
compression implementation for the G.723.1 codec, where the G.723.1 codec has a basic frame
duration of 30 ms.

The rationale for the motivation to combine Russell and Gan is the same as that provided 
in the rejection of claim 28. 

Conclusion 

THIS ACTION IS FINAL. Applicant is reminded of the extension of time policy as set forth in
37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within TWO 
MONTHS of the mailing date of this final action and the advisory action is not mailed until after 
the end of the THREE-MONTH shortened statutory period, then the shortened statutory period 
will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 
1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, 
will the statutory period for reply expire later than SIX MONTHS from the mailing date of this 
final action. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to SHIRLEY X. ZHANG whose telephone number is (571) 270-5012.
The examiner can normally be reached on Monday through Friday 8:00am - 5:30pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, William Vaughn, can be reached on (571) 272-3922. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/S.X.Z./ Art Unit 2444 
9/13/2010 

/William C. Vaughn, Jr./ 

Supervisory Patent Examiner, Art Unit 2444 



